Abstract: We propose an optimal control framework for
persistent monitoring problems where the objective is to control
the movement of mobile agents to minimize an uncertainty
metric in a given mission space. For a single agent in a
one-dimensional space, we show that the optimal solution is obtained
in terms of a sequence of switching locations, thus reducing
it to a parametric optimization problem. Using Infinitesimal
Perturbation Analysis (IPA) we obtain a complete solution
through a gradient-based algorithm. We also discuss a receding
horizon controller which is capable of obtaining a near-optimal
solution on-the-fly. We illustrate our approach with numerical
examples.

I. Introduction 

Enabled by recent technological advances, the deployment 
of autonomous agents that can cooperatively perform com- 
plex tasks is rapidly becoming a reality. In particular, there 
has been considerable progress reported in the literature on 
sensor networks that can carry out coverage control [6], [13], 
[17], surveillance [10], [11] and environmental sampling 
[15], [19] missions. In this paper, we are interested in gen- 
erating optimal control strategies for persistent monitoring 
tasks; these arise when agents must monitor a dynamically 
changing environment which cannot be fully covered by a 
stationary team of available agents. Persistent monitoring 
differs from traditional coverage tasks due to the perpetual 
need to cover a changing environment, i.e., all areas of the 
mission space must be visited infinitely often. The main 
challenge in designing control strategies in this case is in 
balancing the presence of agents in the changing environment 
so that it is optimally covered over time while still satisfy- 
ing sensing and motion constraints. Examples of persistent 
monitoring missions include surveillance in a museum to 
prevent unexpected events or thefts, unmanned vehicles for 
border patrol missions, and environmental applications where 
routine sampling of an area is involved. 

In this paper, we address the persistent monitoring problem 
through an optimal control framework to drive agents so as 
to minimize a metric of uncertainty over the environment. In 
coverage control [6], [13], it is common to model knowledge 
of the environment as a non-negative density function defined 
over the mission space, and usually assumed to be fixed 
over time. However, since persistent monitoring tasks involve 
dynamically changing environments, it is natural to extend 
it to a function of both space and time to model uncertainty 
in the environment. We assume that uncertainty at a point
grows in time if it is not covered by any agent sensors; 
for simplicity, we assume this growth is linear. To model 
sensor coverage, we define a probability of detecting events 
at each point of the mission space by agent sensors. Thus, 
the uncertainty of the environment decreases (for simplicity, 
linearly) with a rate proportional to the event detection 
probability, i.e., the higher the sensing effectiveness is, the 
faster the uncertainty is reduced.

While it is desirable to track the value of uncertainty over 
all points in the environment, this is generally infeasible 
due to computational complexity and memory constraints. 
Motivated by polling models in queueing theory, e.g., spatial 
queueing [1], [5], and by stochastic flow models [21], we 
assign sampling points of the environment to be monitored 
persistently (equivalently, we partition the environment into 
a discrete set of regions). We associate to these points 
"uncertainty queues" which are visited by one or more 
servers. The growth in uncertainty at a sampling point can 
then be viewed as a flow into a queue, and the reduction in 
uncertainty (when covered by an agent) can be viewed as the 
queue being visited by mobile servers as in a polling system. 
Moreover, the service flow rates depend on the distance of 
the sampling point to nearby agents. From this point of view, 
we aim to control the movement of the servers (agents) so 
that the total accumulated "uncertainty queue" content is 
minimized. 

Control and motion planning for agents performing per- 
sistent monitoring tasks have been studied in the literature. 
In [17] the focus is on sweep coverage problems, where 
agents are controlled to sweep an area. In [14], [20] a 
similar metric of uncertainty is used to model knowledge 
of a dynamic environment. In [14], the sampling points in a 
1-D environment are denoted as cells, and the optimal control 
policy for a two-cell problem is given. Problems with more 
than two cells are addressed by a heuristic policy. In [20], the 
authors proposed a stabilizing speed controller for a single 
agent so that the accumulated uncertainty over a set of points 
along a given path in the environment is bounded, and an 
optimal controller that minimizes the maximum steady-state 
uncertainty over points of interest, assuming that the agent 
travels along a closed path and does not change direction. 
The persistent monitoring problem is also related to robot 
patrol problems, where a team of robots is required to visit
points in the workspace with frequency constraints [8], [9], 
[12]. 

Our ultimate goal is to optimally control a team of coop- 
erating agents in a 2 or 3-D environment. The contribution 
of this paper is to take a first step toward this goal by
formulating and solving an optimal control problem for one
agent moving in a 1-D mission space in which we minimize 
the accumulated uncertainty over a given time horizon and 
over an arbitrary number of sampling points. Even in this 
simple case, determining a complete explicit solution is 
computationally hard. However, we show that the optimal 
trajectory of the agent is to oscillate in the mission space: 
move at full speed, then switch direction before reaching 
either end point. Thus, we show that the solution is reduced 
to a parametric optimization problem over the switching 
points for such a trajectory. We then use generalized In- 
finitesimal Perturbation Analysis (IPA) [4], [22] to determine 
these optimal switching locations, which fully characterize 
the optimal control for the agent. This establishes the basis 
for extending this approach, first to multiple agents and then 
to a 2-dimensional mission space. It also provides insights 
that motivate the use of a receding horizon approach for 
bypassing the computational complexity limiting real-time 
control actions. These next steps are the subject of ongoing 
research. 

The rest of the paper is organized as follows. Section II
formulates the optimal control problem. Section III characterizes
the solution of the optimal control problem in terms
of switching points in the mission space, and includes IPA in
conjunction with a gradient-based algorithm to compute the
sequence of optimal switching locations. Section IV provides
some numerical results. Section V discusses extensions of
this result to a receding horizon framework and to multiple
agents. Section VI concludes the paper.



II. Persistent Monitoring Problem Formulation 

We consider a mobile agent in a 1-dimensional mission
space of length $L$. Let the position of the agent be $s(t) \in [0,L]$
with dynamics:

$$\dot{s}(t) = u(t), \quad s(0) = 0 \tag{1}$$

i.e., we assume that the agent can control its direction and
speed. We assume that the speed is constrained by $|u(t)| \le 1$.

We associate with every point $x \in [0,L]$ a function $p(x,s)$ at
state $s(t)$ that captures the probability of detecting an event at
this point. We assume that $p(x,s) = 1$ if $x = s$, and that $p(x,s)$
decays as the distance $|x - s|$ between $x$ and $s$
increases. Assuming a finite sensing range $r$, we set $p(x,s) = 0$
when $|x - s| > r$. In this paper, we use the linear decay model
shown below as our event detection probability function:

$$p(x,s) = \begin{cases} 1 - \dfrac{|x-s|}{r} & \text{if } |x-s| \le r \\ 0 & \text{if } |x-s| > r \end{cases} \tag{2}$$



We consider a set of points $\{\alpha_i\}$, $i = 1,\dots,M$, $\alpha_i \in [0,L]$,
and associate a time-varying measure of uncertainty with
each point $\alpha_i$, which we denote by $R_i(t)$. Without loss
of generality, we assume $0 \le \alpha_1 \le \cdots \le \alpha_M \le L$ and, to
simplify notation, we set $p_i(s(t)) \equiv p(\alpha_i, s(t))$. This set may
be selected to contain points of interest in the environment,
or sampled points from the mission space. Alternatively, we
may consider a partition of $[0,L]$ into $M$ intervals whose
center points are $\alpha_i = (2i-1)L/2M$, $i = 1,\dots,M$. We can
then set $p(x,s) = p_i(s)$ for all $x \in [\alpha_i - \frac{L}{2M}, \alpha_i + \frac{L}{2M}]$. The
uncertainty functions $R_i(t)$ are defined to have the following
properties: (i) $R_i(t)$ increases with a fixed rate dependent on
$\alpha_i$, if $p_i(s(t)) = 0$, (ii) $R_i(t)$ decreases with a fixed rate if
$p_i(s(t)) = 1$, and (iii) $R_i(t) \ge 0$ for all $t$. It is then natural
to model uncertainty so that its decrease is proportional
to the probability of detection. In particular, we model the
dynamics of $R_i(t)$, $i = 1,\dots,M$, as follows:

$$\dot{R}_i(t) = \begin{cases} 0 & \text{if } R_i(t) = 0,\ A_i \le B p_i(s(t)) \\ A_i - B p_i(s(t)) & \text{otherwise} \end{cases} \tag{3}$$

where we assume that initial conditions $R_i(0)$, $i = 1,\dots,M$,
are given and that $B > A_i > 0$ for all $i$ (thus, the uncertainty
strictly decreases when $s(t) = \alpha_i$).
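The detection model (2) and the uncertainty dynamics (3) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the forward-Euler step with clipping at zero are assumptions made here and are not part of the paper.

```python
def detection_prob(x, s, r):
    """Linear-decay detection probability p(x, s) of (2)."""
    d = abs(x - s)
    return 1.0 - d / r if d <= r else 0.0


def uncertainty_rate(R, A, B, p):
    """Right-hand side of (3) for one sampling point.

    R: current uncertainty, A: growth rate A_i, B: maximal reduction rate,
    p: detection probability p_i(s(t)).
    """
    if R <= 0.0 and A <= B * p:
        return 0.0          # the "queue" is empty and stays empty
    return A - B * p


def step_uncertainty(R, A, B, p, dt):
    """One forward-Euler step of (3); clipping at 0 is a discretization assumption."""
    return max(0.0, R + uncertainty_rate(R, A, B, p) * dt)
```

For instance, with $A_i = 0.01$ and $B = 3$, a point directly under the agent ($p = 1$) drains at rate $2.99$, while a point outside the sensing range grows at rate $0.01$.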

Viewing persistent monitoring as a polling system, each
point $\alpha_i$ (equivalently, the $i$th interval in $[0,L]$) is associated with
a "virtual queue" where uncertainty accumulates with inflow
rate $A_i$. The service rate of this queue is time-varying and
given by $B p_i(s(t))$, controllable through the agent position at
time $t$, as shown in Fig. 1. This interpretation is convenient
for characterizing the stability of this system: for each queue,
we may require that $A_i < \frac{1}{T}\int_0^T B p_i(s(t))\,dt$. Alternatively, we
may require that each queue becomes empty at least once
over $[0,T]$. We may also impose conditions such as $R_i(T) \le R_{\max}$
for each queue as additional constraints for our problem
so as to provide bounded uncertainty guarantees, although we
will not do so in this paper. Note that this analogy readily
extends to multi-agent and 2 or 3-D settings. Also, note that
$B$ can also be made location dependent without affecting the
analysis in this paper.



Fig. 1. A queueing system analog of the persistent monitoring problem.

The goal of the optimal persistent monitoring problem we
consider is to control the mobile agent direction and speed
$u(t)$ so that the cumulative uncertainty over all sensor points
$\{\alpha_i\}$, $i = 1,\dots,M$, is minimized over a fixed time horizon $T$.
Thus, we aim to solve the following optimal control problem:

$$\textbf{Problem P1:} \quad \min_{u(t)} \ J = \frac{1}{T}\int_0^T \sum_{i=1}^{M} R_i(t)\,dt \tag{4}$$

subject to the agent dynamics (1), uncertainty dynamics (3),
state constraint $0 \le s(t) \le L$, $t \in [0,T]$, and control constraint
$|u(t)| \le 1$, $t \in [0,T]$.

III. Optimal Control Solution 

In this section we first characterize the optimal control
solution of Problem P1 and show that it is reduced to a
parametric optimization problem. This allows us to utilize
the IPA method [4] to find a complete optimal solution.

A. Hamiltonian analysis 

We define the state vector $x(t) = [s(t), R_1(t), \dots, R_M(t)]^T$
and the associated costate vector $\lambda(t) = [\lambda_s(t), \lambda_1(t), \dots, \lambda_M(t)]^T$.
In view of the discontinuity
in the dynamics of $R_i(t)$ in (3), the optimal state trajectory
may contain a boundary arc when $R_i(t) = 0$ for any $i$;
otherwise, the state evolves in an interior arc. We first
analyze the system operating in such an interior arc. Due to
(1) and (3), the Hamiltonian is:

$$H(x,\lambda,u) = \sum_{i=1}^{M} R_i(t) + \lambda_s(t)\,u(t) + \sum_{i=1}^{M} \lambda_i(t)\big(A_i - B p_i(s(t))\big) \tag{5}$$

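and the costate equations $\dot{\lambda} = -\partial H/\partial x$ are:

$$\begin{aligned} \dot{\lambda}_s(t) &= -\frac{\partial H}{\partial s} = -\frac{B}{r}\sum_{i \in F^-(t)} \lambda_i(t) + \frac{B}{r}\sum_{i \in F^+(t)} \lambda_i(t), \\ \dot{\lambda}_i(t) &= -\frac{\partial H}{\partial R_i} = -1, \quad i = 1,\dots,M, \end{aligned} \tag{6}$$

where we have used (2), and the sets $F^-(t)$ and $F^+(t)$ are
defined as:

$$F^-(t) = \{i : s(t) - r \le \alpha_i \le s(t)\}, \qquad F^+(t) = \{i : s(t) < \alpha_i \le s(t) + r\},$$

so that they identify all points $\alpha_i$ within the agent's sensing
range. Since we impose no terminal state constraints, the
boundary conditions are $\lambda_s(T) = 0$ and $\lambda_i(T) = 0$, $i = 1,\dots,M$.
Applying the Pontryagin minimum principle to (5)
with $u^*(t)$, $t \in [0,T)$, denoting an optimal control, we have

$$H(x^*,\lambda^*,u^*) = \min_{u \in [-1,1]} H(x,\lambda,u)$$

and, since $H$ is linear in $u$ with coefficient $\lambda_s(t)$, it is necessary for an
optimal control to satisfy:

$$u^*(t) = \begin{cases} 1 & \text{if } \lambda_s(t) < 0 \\ -1 & \text{if } \lambda_s(t) > 0 \end{cases} \tag{7}$$

This condition excludes the case where $\lambda_s(t) = 0$ over some
finite "singular intervals" [2]. It turns out this can arise only
in some pathological cases which we shall not discuss in this
paper.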
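As a check on the costate equation for $\lambda_s$ in (6), the derivative of the Hamiltonian (5) with respect to $s$ can be worked out directly from the linear detection model (2). The following is a sketch of that calculation; the one-sided convention at $\alpha_i = s$, where $p_i$ is not differentiable, is an assumption made here to match the definition of $F^-(t)$.

```latex
% Derivative of p_i(s) = 1 - |alpha_i - s| / r inside the sensing range:
\frac{\partial p_i(s)}{\partial s} =
\begin{cases}
  -\dfrac{1}{r} & \text{if } s - r \le \alpha_i \le s \quad (i \in F^-), \\[2mm]
  +\dfrac{1}{r} & \text{if } s < \alpha_i \le s + r \quad (i \in F^+), \\[2mm]
  0             & \text{if } |\alpha_i - s| > r .
\end{cases}

% Only the last sum in (5) depends on s, hence
\dot{\lambda}_s
  = -\frac{\partial H}{\partial s}
  = B \sum_{i=1}^{M} \lambda_i \frac{\partial p_i(s)}{\partial s}
  = -\frac{B}{r} \sum_{i \in F^-} \lambda_i
    + \frac{B}{r} \sum_{i \in F^+} \lambda_i .
```

In particular, when all $\lambda_i$ are equal and the two index sets have the same size, the two sums cancel and $\dot{\lambda}_s = 0$.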

The implication of (6) with $\lambda_i(T) = 0$ is that $\lambda_i(t) = T - t$
for all $t \in [0,T]$ and all $i = 1,\dots,M$, and $\lambda_i(t)$ is
monotonically decreasing starting with $\lambda_i(0) = T$. However,
this is only true if the entire optimal trajectory is an interior
arc, i.e., the state constraints remain inactive. On the other
hand, looking at (6), observe that when the two end points,
$0$ and $L$, are not within the range of the agent, we have
$|F^-(t)| = |F^+(t)|$, since the number of indices $i$ satisfying
$s(t) - r \le \alpha_i \le s(t)$ is the same as that satisfying $s(t) < \alpha_i \le s(t) + r$.
Consequently, $\dot{\lambda}_s(t) = 0$, i.e., $\lambda_s(t)$ remains constant
as long as this condition is satisfied and, in addition, none
of the state constraints $R_i(t) \ge 0$, $i = 1,\dots,M$, is active.

Thus, as long as the optimal trajectory is an interior arc and
$\lambda_s(t) < 0$, the agent moves at maximal speed $u^*(t) = 1$ in the
positive direction towards the point $s = L$. If $\lambda_s(t)$ switches
sign before any of the state constraints $R_i(t) \ge 0$, $i = 1,\dots,M$,
becomes active or the agent reaches the end point $s = L$, then
$u^*(t) = -1$ and the agent reverses its direction or, possibly,
comes to rest. In what follows, we examine the effect of the
state constraints and will establish the fact that the complete
solution of this problem boils down to determining a set
of switching locations over $(0,L)$ with the end points being
infeasible on an optimal trajectory.

The dynamics in (3) indicate a discontinuity arising when
the condition $R_i(t) = 0$ is satisfied while $\dot{R}_i(t) = A_i - B p_i(s(t)) < 0$
for some $i = 1,\dots,M$. Thus, $R_i = 0$ defines an
interior boundary condition which is not an explicit function
of time. Following standard optimal control analysis [2], if
this condition is satisfied at time $\tau$ for some $k \in \{1,\dots,M\}$,

$$H\big(x(\tau^-),\lambda(\tau^-),u(\tau^-)\big) = H\big(x(\tau^+),\lambda(\tau^+),u(\tau^+)\big) \tag{8}$$

where we note that one can make a choice of setting the
Hamiltonian to be continuous at the entry point of a boundary
arc or at the exit point. Using (5) and (3), (8) implies:

$$\lambda_s^*(\tau^-)\,u^*(\tau^-) + \lambda_k^*(\tau^-)\big(A_k - B p_k(s(\tau^-))\big) = \lambda_s^*(\tau^+)\,u^*(\tau^+) \tag{9}$$

In addition, $\lambda_s^*(\tau^-) = \lambda_s^*(\tau^+)$ and $\lambda_i^*(\tau^-) = \lambda_i^*(\tau^+)$ for all
$i \ne k$, but $\lambda_k^*(\tau)$ may experience a discontinuity so that:

$$\lambda_k^*(\tau^-) = \lambda_k^*(\tau^+) - \pi_k \tag{10}$$

where $\pi_k \ge 0$. Recalling (7), since $\lambda_s^*(\tau)$ remains unaffected,
so does the optimal control, i.e., $u^*(\tau^-) = u^*(\tau^+)$. Moreover,
since this is an entry point of a boundary arc, it follows
from (3) that $\dot{R}_k(\tau^-) = A_k - B p_k(s(\tau^-)) < 0$. Therefore, (9)
and (10) imply that

$$\lambda_k^*(\tau^-) = 0, \qquad \lambda_k^*(\tau^+) = \pi_k \ge 0.$$

The actual evaluation of the costate vector over the interval
$[0,T]$ requires solving (6), which in turn involves the
determination of all points where the state variables $R_i(t)$
reach their minimum feasible values $R_i(t) = 0$, $i = 1,\dots,M$.
This generally involves the solution of a two-point-boundary-value
problem. However, our analysis thus far has already
established the structure of the optimal control (7), which we
have seen to remain unaffected by the presence of boundary
arcs where $R_i(t) = 0$ for one or more $i = 1,\dots,M$. Let us now
turn our attention to the constraints $s(t) \ge 0$ and $s(t) \le L$.
The following proposition asserts that neither of these can
become active on an optimal trajectory.

Proposition 1: On an optimal trajectory, $s^*(t) \ne 0$ and
$s^*(t) \ne L$ for all $t \in (0,T]$.

Proof: Suppose $s(t) \ge 0$ becomes active at some $\tau \in (0,T)$.
In this case, $\lambda_i(\tau^-) = \lambda_i(\tau^+)$ for all $i = 1,\dots,M$, but
$\lambda_s(\tau)$ may experience a discontinuity so that

$$\lambda_s(\tau^-) = \lambda_s(\tau^+) - \pi_0$$

where $\pi_0 \ge 0$ is a scalar constant. Since the constraint $s = 0$
is not an explicit function of time, (8) holds and, using (5),
we get

$$\lambda_s^*(\tau^-)\,u^*(\tau^-) = \lambda_s^*(\tau^+)\,u^*(\tau^+) \tag{11}$$

Clearly, as the agent approaches $s = 0$ at time $\tau$, we must have
$\dot{s}^*(\tau^-) = u^*(\tau^-) < 0$ and, from (7), $\lambda_s^*(\tau^-) > 0$. It follows that
$\lambda_s^*(\tau^+) = \lambda_s^*(\tau^-) + \pi_0 > 0$. On the other hand, $u^*(\tau^+) \ge 0$,
since the agent must either come to rest or reverse its motion
at $s = 0$, hence $\lambda_s^*(\tau^+)\,u^*(\tau^+) \ge 0$. From (11), this contradicts
the fact that $\lambda_s^*(\tau^-)\,u^*(\tau^-) < 0$ and we conclude that $s^*(t) = 0$
cannot occur. By the exact same argument, $s^*(t) = L$ also
cannot occur. ■

Based on this analysis, the optimal control in (7) depends
entirely on the points where $\lambda_s(t)$ switches sign and, in
light of Prop. 1, the solution of the problem reduces to the
determination of a parameter vector $\theta = [\theta_1,\dots,\theta_N]^T$, where
$\theta_j \in (0,L)$ denotes the $j$th location where the optimal control
changes sign. Note that $N$ is generally not known a priori
and depends on the time horizon $T$.

Since $s(0) = 0$, from Prop. 1 we have $u^*(0) = 1$, thus $\theta_1$
corresponds to the optimal control switching from $1$ to $-1$.
Furthermore, $\theta_j$, $j$ odd, always corresponds to $u^*(t)$ switching
from $1$ to $-1$, and vice versa if $j$ is even. Thus, we have
the following constraints on the switching locations for all
$j = 2,\dots,N$:

$$\begin{cases} \theta_j \le \theta_{j-1}, & \text{if } j \text{ is even} \\ \theta_j \ge \theta_{j-1}, & \text{if } j \text{ is odd.} \end{cases} \tag{12}$$
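The parametrization by switching locations can be made concrete with a short sketch that checks constraint (12) and reconstructs the agent position from $\theta$. The function names are hypothetical, and the code assumes unit speed, $s(0) = 0$, and a control that alternates $+1, -1, +1, \dots$ at the successive $\theta_j$, as described in the text. Nothing here stops the agent from leaving $[0,L]$ after the last switch if $T$ is large.

```python
def satisfies_ordering(theta, L):
    """Check theta_j in (0, L) and the alternating constraint (12)."""
    if any(not (0.0 < th < L) for th in theta):
        return False
    for j in range(2, len(theta) + 1):          # 1-based index j = 2..N
        prev, cur = theta[j - 2], theta[j - 1]
        if j % 2 == 0 and cur > prev:            # even j: theta_j <= theta_{j-1}
            return False
        if j % 2 == 1 and cur < prev:            # odd j:  theta_j >= theta_{j-1}
            return False
    return True


def switching_times(theta):
    """Times at which the agent, at unit speed from s(0) = 0, reaches each theta_j."""
    times, t, s = [], 0.0, 0.0
    for th in theta:
        t += abs(th - s)
        times.append(t)
        s = th
    return times


def position(theta, t):
    """Agent position s(t) when the control flips sign at each theta_j."""
    s, u, elapsed = 0.0, 1.0, 0.0
    for th in theta:
        leg = abs(th - s)
        if t <= elapsed + leg:
            return s + u * (t - elapsed)
        elapsed += leg
        s, u = th, -u
    return s + u * (t - elapsed)                 # after the last switch
```

With the two switching locations reported in Sec. IV, `position([17.81, 1.29], 20.0)` lies on the return leg, after the first switch at time 17.81.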



It is now clear that the behavior of the agent under the
optimal control policy (7) is that of a hybrid system whose
dynamics undergo switches when $u^*(t)$ changes between
$1$ and $-1$ or when $R_i(t)$ reaches or leaves the boundary
value $R_i = 0$. As a result, we are faced with a parametric
optimization problem for a system with hybrid dynamics.
This is a setting where one can apply the generalized theory
of Infinitesimal Perturbation Analysis (IPA) in [4], [22] to
obtain the gradient of the objective function $J$ in (4) with
respect to the vector $\theta$ and, therefore, determine an optimal
vector $\theta^*$ through a gradient-based optimization approach.

Remark 1: If the agent dynamics are replaced by a model
such as $\dot{s}(t) = g(s) + b\,u(t)$, observe that (7) still holds,
as does Prop. 1. The only difference lies in (6), which
would involve a dependence on $dg(s)/ds$ and further complicate
the associated two-point-boundary-value problem. However,
since the optimal solution is also defined by a parameter
vector $\theta = [\theta_1,\dots,\theta_N]^T$, we can still apply the IPA approach
presented in the next section.

B. Infinitesimal Perturbation Analysis (IPA) 

Our analysis has shown that, for an optimal trajectory,
the agent always moves at full speed and never reaches
either boundary point, i.e., $0 < s^*(t) < L$ (excluding certain
pathological cases as mentioned earlier). Thus, the agent's
movement can be parametrized through $\theta = [\theta_1,\dots,\theta_N]^T$
where $\theta_i$ is the $i$th control switching point and the solution
of Problem P1 reduces to the determination of an optimal
parameter vector $\theta^*$. As we pointed out, the agent's behavior
on an optimal trajectory defines a hybrid system, and the
switching locations translate to switching times between
particular modes of the hybrid system. Hence, this is similar
to switching-time optimization problems, e.g., [7], [18], [23],
except that we can only control a subset of mode switching
times.

To describe an IPA treatment of the problem, we first 
present the hybrid automaton model corresponding to the 
system operating on an optimal trajectory. 

Hybrid automaton model. We use a standard definition
of a hybrid automaton (e.g., see [3]) as the formalism to
model such a system. Thus, let $q \in Q$ (a countable set)
denote the discrete state (or mode) and $x \in X \subseteq \mathbb{R}^n$ denote
the continuous state. Let $\upsilon \in \Upsilon$ (a countable set) denote a
discrete control input and $u \in U \subseteq \mathbb{R}^m$ a continuous control
input. Similarly, let $\delta \in \Delta$ (a countable set) denote a discrete
disturbance input and $d \in D \subseteq \mathbb{R}^p$ a continuous disturbance
input. The state evolution is determined by means of (i) a
vector field $f : Q \times X \times U \times D \to X$, (ii) an invariant (or
domain) set $Inv : Q \times \Upsilon \times \Delta \to 2^X$, (iii) a guard set
$Guard : Q \times Q \times \Upsilon \times \Delta \to 2^X$, and (iv) a reset function
$r : Q \times Q \times X \times \Upsilon \times \Delta \to X$. The system remains at a discrete state $q$ as long
as the continuous (time-driven) state $x$ does not leave the set
$Inv(q,\upsilon,\delta)$. If $x$ reaches a set $Guard(q,q',\upsilon,\delta)$ for some
$q' \in Q$, a discrete transition can take place. If this transition
does take place, the state instantaneously resets to $(q',x')$
where $x'$ is determined by the reset map $r(q,q',x,\upsilon,\delta)$.
Changes in $\upsilon$ and $\delta$ are discrete events that either enable a
transition from $q$ to $q'$ by making sure $x \in Guard(q,q',\upsilon,\delta)$
or force a transition out of $q$ by making sure $x \notin Inv(q,\upsilon,\delta)$.
We will classify all events that cause discrete state transitions
in a manner that suits the purposes of IPA. Since our problem
is set in a deterministic framework, $\delta$ and $d$ will not be used.

We show in Fig. 2 a partial hybrid automaton model of
the system: due to the size of the overall model, Fig. 2
is limited to the behavior of the agent with respect to a
single $\alpha_i$, $i \in \{1,\dots,M\}$. The model consists of 14 discrete
states (modes) and is symmetric in the sense that states 1-7
correspond to the agent operating with $u(t) = 1$, and states
8-14 correspond to the agent operating with $u(t) = -1$.
The events that cause state transitions can be placed in three
categories: (i) The value of $R_i(t)$ becomes $0$ and triggers a
switch in the dynamics of (3). This can only happen when
$R_i(t) > 0$ and $\dot{R}_i(t) = A_i - B p_i(s(t)) < 0$ (e.g., in states 3
and 4), causing a transition to state 7 in which the invariant
condition is $R_i(t) = 0$. (ii) The agent reaches a switching
location, indicated by the guard condition $s(t) = \theta_j$ for any
$j = 1,\dots,N$. In these cases, a transition results from a state $q$
to $q + 7$ if $q = 1,\dots,6$ and to $q - 7$ otherwise. (iii) The agent
position reaches one of several critical values that affect
the dynamics of $R_i(t)$ while $R_i(t) > 0$. When $s(t) = \alpha_i - r$,
the value of $p_i(s(t))$ becomes strictly positive and $\dot{R}_i(t) = A_i - B p_i(s(t)) > 0$,
as in the transition $1 \to 2$. Subsequently,
when $s(t) = \alpha_i - r(1 - A_i/B)$, as in the transition $2 \to 3$,
the value of $p_i(s(t))$ becomes sufficiently large to cause
$\dot{R}_i(t) = A_i - B p_i(s(t)) \le 0$, so that a transition due to $R_i(t) = 0$
becomes feasible at this state. Similar transitions occur when
$s(t) = \alpha_i$, $s(t) = \alpha_i + r(1 - A_i/B)$, and $s(t) = \alpha_i + r$. The latter
results in state 6 where $\dot{R}_i(t) = A_i > 0$ and the only feasible
event is $s(t) = \theta_j$, $j$ odd, when a switch must occur and a
transition to state 13 takes place (similarly for state 8).

Fig. 2. Hybrid automaton for each $\alpha_i$. Red arrows represent events when the control switches between $1$ and $-1$. Blue arrows represent events when $R_i$
becomes $0$. Black arrows represent all other events.
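The mode structure described above can be illustrated by a small classifier that maps the agent position, control sign, and uncertainty level at one sampling point to a mode number. This is a hedged sketch: the function name, the thresholds `beta1` and `beta2` (taken here as $\alpha_i \mp r(1 - A_i/B)$), the treatment of interval end points, and the rule that mode 7 requires both $R_i = 0$ and $A_i \le B p_i$ are assumptions chosen to be consistent with the text, not definitions quoted from the paper.

```python
def automaton_state(s, u, R_i, alpha_i, r, A_i, B):
    """Return a mode number in 1..14 for one sampling point alpha_i.

    Modes 1-6 follow the agent position from left to right with R_i > 0,
    mode 7 is the boundary arc R_i = 0; adding 7 gives the mirror mode
    for u = -1.
    """
    beta1 = alpha_i - r * (1.0 - A_i / B)   # where A_i - B p_i changes sign (left)
    beta2 = alpha_i + r * (1.0 - A_i / B)   # where A_i - B p_i changes sign (right)
    if R_i <= 0.0 and beta1 <= s <= beta2:
        q = 7                                # uncertainty held at zero
    else:
        edges = [alpha_i - r, beta1, alpha_i, beta2, alpha_i + r]
        q = 1 + sum(s >= e for e in edges)   # count thresholds already passed
    return q if u > 0 else q + 7
```

For example, with $\alpha_i = 10$, $r = 4$, $A_i = 0.01$, $B = 3$, an agent at $s = 8$ moving right with $R_i > 0$ is in mode 3, and the same position with $R_i = 0$ gives mode 7.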

IPA review. Before proceeding, we provide a brief review
of the IPA framework for general stochastic hybrid systems
as presented in [4]. In our case, the system is deterministic,
offering several simplifications. The purpose of IPA is to
study the behavior of a hybrid system state as a function
of a parameter vector $\theta \in \Theta$ for a given compact, convex
set $\Theta \subset \mathbb{R}^l$. Let $\{\tau_k(\theta)\}$, $k = 1,\dots,K$, denote the occurrence
times of all events in the state trajectory. For convenience, we
set $\tau_0 = 0$ and $\tau_{K+1} = T$. Over an interval $[\tau_k(\theta), \tau_{k+1}(\theta))$,
the system is at some mode during which the time-driven
state satisfies $\dot{x} = f_k(x,\theta,t)$. An event at $\tau_k$ is classified as (i)
Exogenous if it causes a discrete state transition independent
of $\theta$ and satisfies $\frac{d\tau_k}{d\theta} = 0$; (ii) Endogenous, if there exists
a continuously differentiable function $g_k : \mathbb{R}^n \times \Theta \to \mathbb{R}$ such
that $\tau_k = \min\{t > \tau_{k-1} : g_k(x(\theta,t),\theta) = 0\}$; and (iii)
Induced if it is triggered by the occurrence of another event at
time $\tau_m \le \tau_k$. Since the system considered in this paper does
not include induced events, we will limit ourselves to the first
two event types. IPA specifies how changes in $\theta$ influence
the state $x(\theta,t)$ and the event times $\tau_k(\theta)$ and, ultimately,
how they influence interesting performance metrics which
are generally expressed in terms of these variables.

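Given $\theta = [\theta_1,\dots,\theta_N]^T$, we use the following notation for Jacobian
matrices: $x'(t) \equiv \frac{\partial x(\theta,t)}{\partial\theta}$, $\tau_k' \equiv \frac{\partial\tau_k(\theta)}{\partial\theta}$, $k = 1,\dots,K$, for all
state and event time derivatives. It is shown in [4] that $x'(t)$
satisfies:

$$\frac{d}{dt}x'(t) = \frac{\partial f_k(t)}{\partial x}\,x'(t) + \frac{\partial f_k(t)}{\partial\theta} \tag{13}$$

for $t \in [\tau_k, \tau_{k+1})$ with boundary condition:

$$x'(\tau_k^+) = x'(\tau_k^-) + \big[f_{k-1}(\tau_k^-) - f_k(\tau_k^+)\big]\,\tau_k' \tag{14}$$

for $k = 0,\dots,K$. In addition, in (14), the gradient vector for
each $\tau_k$ is $\tau_k' = 0$ if the event at $\tau_k$ is exogenous and

$$\tau_k' = -\left[\frac{\partial g_k}{\partial x}\,f_{k-1}(\tau_k^-)\right]^{-1}\left(\frac{\partial g_k}{\partial\theta} + \frac{\partial g_k}{\partial x}\,x'(\tau_k^-)\right) \tag{15}$$

if the event at $\tau_k$ is endogenous and defined as long as
$\frac{\partial g_k}{\partial x}\,f_{k-1}(\tau_k^-) \ne 0$.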

IPA equations. To clarify the presentation, we first note
that $i = 1,\dots,M$ is used to index the points where uncertainty
is measured; $j = 1,\dots,N$ indexes the components
of the parameter vector; and $k = 1,\dots,K$ indexes
event times. In order to apply the three fundamental IPA
equations (13)-(15) to our system, we use the state vector
$x(t) = [s(t), R_1(t),\dots,R_M(t)]^T$ and parameter vector
$\theta = [\theta_1,\dots,\theta_N]^T$. We then identify all events that can occur in
Fig. 2 and consider intervals $[\tau_k(\theta), \tau_{k+1}(\theta))$ over which
the system is in one of the 14 states shown for each
$i = 1,\dots,M$. Applying (13) to $s(t)$ with $f_k(t) = 1$ or $-1$
due to (1) and (7), the solution yields the gradient vector
$\nabla s(t) = [\frac{\partial s}{\partial\theta_1}(t),\dots,\frac{\partial s}{\partial\theta_N}(t)]^T$, where

$$\frac{\partial s}{\partial\theta_j}(t) = \frac{\partial s}{\partial\theta_j}(\tau_k^+), \quad \text{for } t \in [\tau_k, \tau_{k+1}) \tag{16}$$

for all $k = 1,\dots,K$, i.e., for all states $q(t) \in \{1,\dots,14\}$.
Similarly, let $\nabla R_i(t) = [\frac{\partial R_i}{\partial\theta_1}(t),\dots,\frac{\partial R_i}{\partial\theta_N}(t)]^T$, $i = 1,\dots,M$.
We note from (3) that $f_k(t) = 0$ for states $q(t) \in Q_1 = \{7,14\}$;
$f_k(t) = A_i$ for states $q(t) \in Q_2 = \{1,6,8,13\}$; and
$f_k(t) = A_i - B p_i(s(t))$ for all other states, which we further classify
into $Q_3 = \{2,3,11,12\}$ and $Q_4 = \{4,5,9,10\}$. Thus, solving
(13) and using (16) gives:

$$\nabla R_i(t) = \nabla R_i(\tau_k^+) - \begin{cases} 0 & \text{if } q(t) \in Q_1 \cup Q_2 \\ B\,\dfrac{\partial p_i(s)}{\partial s}\,\nabla s(\tau_k^+)\cdot(t - \tau_k) & \text{otherwise} \end{cases}$$

where $\frac{\partial p_i(s)}{\partial s} = \pm\frac{1}{r}$ as evaluated from (2) depending on the
sign of $\alpha_i - s(t)$ at each automaton state.

We now turn our attention to the determination of $\nabla s(\tau_k^+)$
and $\nabla R_i(\tau_k^+)$ from (14), which involves the event time gradient
vectors $\nabla\tau_k = [\frac{\partial\tau_k}{\partial\theta_1},\dots,\frac{\partial\tau_k}{\partial\theta_N}]^T$ for $k = 1,\dots,K$. Looking at
Fig. 2, there are three readily distinguishable cases regarding
the events that cause state transitions:

Case 1: An event at time $\tau_k$ which is neither $R_i = 0$ nor
$s = \theta_j$, for any $j = 1,\dots,N$. In this case, it is easy to see
that the dynamics of both $s(t)$ and $R_i(t)$ are continuous, so
that $f_{k-1}(\tau_k^-) = f_k(\tau_k^+)$ in (14) applied to $s(t)$ and $R_i(t)$,
$i = 1,\dots,M$, and we get

$$\nabla s(\tau_k^+) = \nabla s(\tau_k^-), \qquad \nabla R_i(\tau_k^+) = \nabla R_i(\tau_k^-), \quad i = 1,\dots,M \tag{17}$$
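Case 2: An event $R_i = 0$ at time $\tau_k$. This corresponds
to transitions $3 \to 7$, $4 \to 7$, $10 \to 14$ and $11 \to 14$ in Fig. 2,
where the dynamics of $s(t)$ are still continuous, but the
dynamics of $R_i(t)$ switch from $f_{k-1}(\tau_k^-) = A_i - B p_i(s(\tau_k^-))$
to $f_k(\tau_k^+) = 0$. Thus, $\nabla s(\tau_k^-) = \nabla s(\tau_k^+)$, but we need to
evaluate $\tau_k'$ to determine $\nabla R_i(\tau_k^+)$. Observing that this event
is endogenous, (15) applies with $g_k = R_i = 0$ and we get

$$\frac{\partial\tau_k}{\partial\theta_j} = -\frac{\frac{\partial R_i}{\partial\theta_j}(\tau_k^-)}{A_i - B p_i(s(\tau_k^-))}, \quad j = 1,\dots,N, \ k = 1,\dots,K$$

It follows from (14) that

$$\frac{\partial R_i}{\partial\theta_j}(\tau_k^+) = \frac{\partial R_i}{\partial\theta_j}(\tau_k^-) - \big[A_i - B p_i(s(\tau_k^-))\big]\,\frac{\frac{\partial R_i}{\partial\theta_j}(\tau_k^-)}{A_i - B p_i(s(\tau_k^-))} = 0$$

Thus, $\frac{\partial R_i}{\partial\theta_j}(\tau_k^+)$ is always reset to $0$ regardless of $\frac{\partial R_i}{\partial\theta_j}(\tau_k^-)$.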

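Case 3: An event at time $\tau_k$ due to a control sign change
at $s = \theta_j$, $j = 1,\dots,N$. This corresponds to any transition
between the upper and lower part of the hybrid automaton in
Fig. 2. In this case, the dynamics of $R_i(t)$ are continuous and
we have $\frac{\partial R_i}{\partial\theta_j}(\tau_k^+) = \frac{\partial R_i}{\partial\theta_j}(\tau_k^-)$ for all $i, j, k$. On the other hand,
we have $\dot{s}(\tau_k^+) = u(\tau_k^+) = -u(\tau_k^-) = \pm 1$. Observing that any
such event is endogenous, (15) applies with $g_k = s - \theta_j = 0$
for some $j = 1,\dots,N$ and we get

$$\frac{\partial\tau_k}{\partial\theta_j} = \frac{1 - \frac{\partial s}{\partial\theta_j}(\tau_k^-)}{u(\tau_k^-)} \tag{18}$$

Combining (18) with (14) and recalling that $u(\tau_k^+) = -u(\tau_k^-)$, we have

$$\frac{\partial s}{\partial\theta_j}(\tau_k^+) = \frac{\partial s}{\partial\theta_j}(\tau_k^-) + \big[u(\tau_k^-) - u(\tau_k^+)\big]\,\frac{1 - \frac{\partial s}{\partial\theta_j}(\tau_k^-)}{u(\tau_k^-)} = 2$$

where $\frac{\partial s}{\partial\theta_j}(\tau_k^-) = 0$ because $\frac{\partial s}{\partial\theta_j}(0) = 0 = \frac{\partial s}{\partial\theta_j}(t)$ for all
$t \in [0,\tau_k)$, since the position of the agent cannot be affected by
$\theta_j$ prior to this event.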

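Now, let us consider the effect of perturbations to $\theta_n$ for
$n < j$, i.e., prior to the current event time $\tau_k$. In this case, we
have $\frac{\partial g_k}{\partial\theta_n} = 0$ and (15) becomes

$$\frac{\partial\tau_k}{\partial\theta_n} = -\frac{\frac{\partial s}{\partial\theta_n}(\tau_k^-)}{u(\tau_k^-)}$$

so that using this in (14) gives:

$$\frac{\partial s}{\partial\theta_n}(\tau_k^+) = \frac{\partial s}{\partial\theta_n}(\tau_k^-) - \big[u(\tau_k^-) - u(\tau_k^+)\big]\,\frac{\frac{\partial s}{\partial\theta_n}(\tau_k^-)}{u(\tau_k^-)} = -\frac{\partial s}{\partial\theta_n}(\tau_k^-)$$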

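Combining the above results, the components of $\nabla s(\tau_k^+)$,
where $\tau_k$ is the event time when $s(\tau_k) = \theta_j$ for some $j$, are
given by

$$\frac{\partial s}{\partial\theta_n}(\tau_k^+) = \begin{cases} -\dfrac{\partial s}{\partial\theta_n}(\tau_k^-) & \text{if } n = 1,\dots,j-1 \\ 2 & \text{if } n = j \\ 0 & \text{if } n = j+1,\dots,N \end{cases} \tag{19}$$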



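It follows from (16) and the analysis of all three cases
above that $\frac{\partial s}{\partial\theta_j}(t)$ for all $j$ is constant throughout an optimal
trajectory except at transitions caused by control switching
locations (Case 3). In particular, for the $k$th event corresponding
to $s(\tau_k) = \theta_j$, $t \in [\tau_k, T]$, if $u(t) = 1$, then $\frac{\partial s}{\partial\theta_j}(t) = -2$ if
$j$ is odd, and $\frac{\partial s}{\partial\theta_j}(t) = 2$ if $j$ is even; similarly, if $u(t) = -1$,
then $\frac{\partial s}{\partial\theta_j}(t) = 2$ if $j$ is odd and $\frac{\partial s}{\partial\theta_j}(t) = -2$ if $j$ is even. In
summary, we can write $\frac{\partial s}{\partial\theta_j}(t)$ as

$$\frac{\partial s}{\partial\theta_j}(t) = \begin{cases} (-1)^j \cdot 2u(t) & t \ge \tau_k \\ 0 & t < \tau_k \end{cases}, \quad j = 1,\dots,N \tag{20}$$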
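The closed form for $\partial s/\partial\theta_j$ summarized above can be cross-checked numerically, because the trajectory is piecewise linear in $\theta$. The sketch below compares the expression $(-1)^j \cdot 2u(t)$ (zero before the $j$th switch) with a central finite difference. All function names are hypothetical, and the trajectory model (unit speed, $s(0) = 0$, alternating control) follows the assumptions of the text.

```python
def position_and_control(theta, t):
    """Return (s(t), u(t)) for a unit-speed agent that flips direction at each theta_j."""
    s, u, elapsed = 0.0, 1.0, 0.0
    for th in theta:
        leg = abs(th - s)
        if t < elapsed + leg:
            return s + u * (t - elapsed), u
        elapsed += leg
        s, u = th, -u
    return s + u * (t - elapsed), u


def switch_time(theta, j):
    """Time at which the j-th (1-based) switching location is reached."""
    t, s = 0.0, 0.0
    for th in theta[:j]:
        t += abs(th - s)
        s = th
    return t


def ds_dtheta_formula(theta, j, t):
    """Closed-form sensitivity: (-1)^j * 2 * u(t) after the j-th switch, else 0."""
    if t < switch_time(theta, j):
        return 0.0
    _, u = position_and_control(theta, t)
    return (-1) ** j * 2.0 * u


def ds_dtheta_fd(theta, j, t, h=1e-6):
    """Central finite-difference estimate of d s(t) / d theta_j."""
    up = list(theta); up[j - 1] += h
    dn = list(theta); dn[j - 1] -= h
    return (position_and_control(up, t)[0] - position_and_control(dn, t)[0]) / (2 * h)
```

The comparison is only meaningful at times away from the switching instants, where the finite difference straddles a kink in $s(t)$.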



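Finally, we can combine (20) with our results for $\frac{\partial R_i}{\partial\theta_j}(t)$
in all three cases above. Letting $s(\tau_l) = \theta_j$, we obtain the
following expression for $\frac{\partial R_i}{\partial\theta_j}(t)$ for all $k \ge l$, $t \in [\tau_k, \tau_{k+1})$:

$$\frac{\partial R_i}{\partial\theta_j}(t) = \frac{\partial R_i}{\partial\theta_j}(\tau_k^+) + \begin{cases} 0 & \text{if } q(t) \in Q_1 \cup Q_2 \\ (-1)^{j+1}\,\dfrac{2B}{r}\,u(\tau_k^+)\cdot(t - \tau_k) & \text{if } q(t) \in Q_3 \\ -(-1)^{j+1}\,\dfrac{2B}{r}\,u(\tau_k^+)\cdot(t - \tau_k) & \text{if } q(t) \in Q_4 \end{cases} \tag{21}$$

with boundary condition

$$\frac{\partial R_i}{\partial\theta_j}(\tau_k^+) = \begin{cases} 0 & \text{if } q(\tau_k^+) \in Q_1 \\ \dfrac{\partial R_i}{\partial\theta_j}(\tau_k^-) & \text{otherwise} \end{cases} \tag{22}$$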



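Objective Function Gradient Evaluation. Since we are
ultimately interested in minimizing the objective function
$J(\theta)$ (now a function of $\theta$ instead of $u$) in (4) with respect
to $\theta$, we first rewrite:

$$J(\theta) = \frac{1}{T}\sum_{i=1}^{M}\sum_{k=0}^{K}\int_{\tau_k(\theta)}^{\tau_{k+1}(\theta)} R_i(t,\theta)\,dt$$

where we have explicitly indicated the dependence on $\theta$. We
then obtain:

$$\nabla J(\theta) = \frac{1}{T}\sum_{i=1}^{M}\sum_{k=0}^{K}\left(\int_{\tau_k}^{\tau_{k+1}} \nabla R_i(t)\,dt + R_i(\tau_{k+1})\,\nabla\tau_{k+1} - R_i(\tau_k)\,\nabla\tau_k\right)$$

Observing the cancellation of all terms of the form
$R_i(\tau_k)\,\nabla\tau_k$ for all $k$ (the sum telescopes, and $\nabla\tau_0 = \nabla\tau_{K+1} = 0$
because $\tau_0 = 0$ and $\tau_{K+1} = T$ are fixed), we finally get

$$\nabla J(\theta) = \frac{1}{T}\sum_{i=1}^{M}\sum_{k=0}^{K}\int_{\tau_k}^{\tau_{k+1}} \nabla R_i(t)\,dt. \tag{23}$$

The evaluation of $\nabla J(\theta)$ therefore depends entirely on
$\nabla R_i(t)$, which is obtained from (21)-(22) and the event times
$\tau_k$, $k = 1,\dots,K$, given initial conditions $s(0) = 0$, $R_i(0)$ for
$i = 1,\dots,M$ and $\nabla R_i(0) = 0$.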
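A simple way to sanity-check any gradient implementation is to evaluate $J(\theta)$ by direct simulation and differentiate it numerically. The sketch below does this with a forward-Euler discretization and central finite differences. It is not the IPA estimator derived above: the time step `dt`, the perturbation size `h`, the left-endpoint quadrature, and all function names are assumptions made for illustration, and $\theta$ is assumed to satisfy the alternating ordering of the switching locations.

```python
def detection_prob(x, s, r):
    """Linear-decay detection probability: 1 at distance 0, 0 beyond range r."""
    d = abs(x - s)
    return 1.0 - d / r if d <= r else 0.0


def position(theta, t):
    """Unit-speed agent starting at 0 that flips direction at each theta_j."""
    s, u, elapsed = 0.0, 1.0, 0.0
    for th in theta:
        leg = abs(th - s)
        if t <= elapsed + leg:
            return s + u * (t - elapsed)
        elapsed += leg
        s, u = th, -u
    return s + u * (t - elapsed)


def cost(theta, alphas, A, B, r, R0, T, dt=0.01):
    """Approximate J(theta) = (1/T) * integral of sum_i R_i(t) over [0, T]."""
    R = list(R0)
    total = 0.0
    for k in range(int(round(T / dt))):
        s = position(theta, k * dt)
        for i, a in enumerate(alphas):
            total += R[i] * dt                       # left-endpoint quadrature
            p = detection_prob(a, s, r)
            rate = 0.0 if (R[i] <= 0.0 and A[i] <= B * p) else A[i] - B * p
            R[i] = max(0.0, R[i] + rate * dt)        # Euler step, clipped at 0
    return total / T


def fd_gradient(theta, h=1e-4, **params):
    """Central finite-difference approximation of the gradient of J."""
    grad = []
    for j in range(len(theta)):
        up = list(theta); up[j] += h
        dn = list(theta); dn[j] -= h
        grad.append((cost(up, **params) - cost(dn, **params)) / (2 * h))
    return grad
```

A call such as `fd_gradient([17.81, 1.29], alphas=[float(i) for i in range(21)], A=[0.01] * 21, B=3.0, r=4.0, R0=[2.0] * 21, T=36.0)` returns one component per switching location. Because of the discretization, its values should be read as a rough cross-check rather than as a reproduction of the reported results.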

Objective Function Optimization. We now seek to obtain
$\theta^*$ minimizing $J(\theta)$ through a standard gradient-based
optimization scheme of the form

$$\theta^{l+1} = \theta^l - \eta_l\,\tilde{\nabla}J(\theta^l) \tag{24}$$

where $\{\eta_l\}$ is an appropriate step size sequence and $\tilde{\nabla}J(\theta)$
is the projection of the gradient $\nabla J(\theta)$ onto the feasible set
(the set of $\theta$ satisfying the constraint (12)). The optimization
scheme terminates when $|\tilde{\nabla}J(\theta)| < \varepsilon$ (for a fixed threshold
$\varepsilon$) for some $\theta$. Our IPA-based algorithm to obtain $\theta^*$
minimizing $J(\theta)$ is summarized in Alg. 1, where we have
adopted the Armijo step-size (see [16]) for $\{\eta_l\}$.

Algorithm 1: IPA-based optimization algorithm to find $\theta^*$

1: Set $N = \lfloor T/L \rfloor$ ($\lfloor\cdot\rfloor$ is the floor function), and set $\theta = [\theta_1,\dots,\theta_N]^T$ satisfying constraint (12)
2: repeat
3:   Compute $s(t)$, $t \in [0,T]$ using $\theta$
4:   Compute $\tilde{\nabla}J(\theta)$ and update $\theta$ through (24)
5: until $|\tilde{\nabla}J(\theta)| < \varepsilon$
6: if $\theta$ satisfies Prop. 1 then
7:   Stop, return $\theta$ as $\theta^*$
8: else
9:   Set $N + 1 \to N$ and set $\theta_N = s(T)$
10:  Go to Step 2
11: end if

Recalling that the dimension $N$ of $\theta^*$ is unknown (it depends on $T$), a distinctive feature of Alg. 1 is that we vary $N$ by possibly increasing it after a vector $\theta$ locally minimizing $J$ is obtained, if it does not satisfy the necessary optimality condition in Prop. 1. We start the search for a feasible $N$ by setting it to $\lfloor T/L \rfloor$, the minimal $N$ for which $\theta$ can satisfy Prop. 1, and only need to increase $N$ if the locally optimal $\theta$ vector violates Prop. 1.
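The outer loop of Alg. 1 (grow $N$ until the necessary condition holds) can be sketched in Python as follows. The sketch assumes the initial dimension is $\lfloor T/L \rfloor$, which matches the starting values $N = 1$ and $N = 9$ used in the two examples of Sec. IV. The inner minimization, the check of Prop. 1, and the evaluation of $s(T)$ are passed in as callables because they are not reproduced here; the names `local_minimize`, `satisfies_prop1` and `final_position` are hypothetical placeholders.

```python
import math

def initial_dimension(T, L):
    """Starting number of switching points, assumed to be floor(T / L)."""
    return math.floor(T / L)

def grow_dimension_until_valid(theta0, local_minimize, satisfies_prop1,
                               final_position, max_dim=50):
    """Steps 2-10 of Alg. 1 with the problem-specific pieces injected.

    local_minimize(theta)  -> locally optimal theta of the same dimension
    satisfies_prop1(theta) -> True if the necessary condition holds
    final_position(theta)  -> s(T) under theta, used as the new last entry
    """
    theta = local_minimize(list(theta0))
    while not satisfies_prop1(theta):
        if len(theta) >= max_dim:
            raise RuntimeError("dimension limit reached before the condition held")
        # Step 9: append one switching point located at s(T), then re-optimize.
        theta = local_minimize(list(theta) + [final_position(theta)])
    return theta

if __name__ == "__main__":
    print(initial_dimension(36, 20), initial_dimension(980, 100))
```

Appending $s(T)$ as the new last switching point leaves the trajectory on $[0, T]$ unchanged, so the next inner minimization starts from the cost already achieved.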

It is possible to increase $N$ further after Alg. 1 stops, and obtain a locally optimal $\theta$ vector with a lower cost. This is due to the possible non-convexity of the problem in terms of $\theta$ and $N$. In practice, this computation can take place in the background while the agent is in operation. Alternatively, we can adapt a receding horizon formulation to compute the optimal control on-line. This approach is explained in more detail in Sec. V.

[Figure 3 panels: "Two switching points" (top) and "Ten switching points" (bottom); agent position $s(t)$ (left) and objective function $J$ vs. iterations (right).]

Fig. 3. Numerical results. Top figures correspond to $L = 20$, $T = 36$, 21 sampling points in $[0, L]$. Bottom figures correspond to $L = 100$, $T = 980$, 101 sampling points in $[0, L]$. Left plots: optimal trajectories. Right plots: $J$ versus iterations.

IV. Numerical results 

In this section we present two numerical examples where we have used Alg. 1 to obtain an optimal persistent monitoring trajectory. The results are shown in Fig. 3. The top two figures correspond to an example with $L = 20$, $M = 21$, $\alpha_1 = 0$, $\alpha_M = 20$, and the remaining sampling points evenly spaced between them. Moreover, $A_i = 0.01$ for all $i$, $B = 3$, $r = 4$, $R_i(0) = 2$ for all $i$ and $T = 36$. We start the algorithm with $\theta = [12]^T$ and $\varepsilon = 2 \times 10^{-10}$. The algorithm stopped after 13 iterations (about 9 sec) using Armijo step-sizes, and the cost $J$ was decreased from 16.63 to $J^* = 10.24$ with $\theta^* = [17.81, 1.29]^T$, i.e., the dimension increased by 1. In the top-left, the optimal trajectory $s^*(t)$ is plotted; in the top-right, $J$ is plotted against iterations. We also increased $N$ to 3 with initial $\theta = [12, 16, 4]$; Alg. 1 converged to a local minimum $J = 13.27 > J^* = 10.24$ under $N = 2$.

The bottom two figures correspond to an example with $L = 100$, $M = 101$ and evenly spaced sampling points over $[0, L]$, $A_i = 0.01$ for all $i$, $B = 3$, $r = 4$, $R_i(0) = 2$ for all $i$ and $T = 980$. We start the algorithm with $N = 9$, $\theta = [95, 95, 95, 95, 95, 5, 5, 5, 5]^T$ and the same $\varepsilon$. The algorithm stopped after 14 iterations (about 10 min, an indication of the rapid increase in computational complexity) using Armijo step-sizes, and $J$ was decreased from 88.10 to $J^* = 70.49$ with $\theta^* = [98.03, 96.97, 96.65, 96.35, 95.70, 2.94, 3.21, 3.61, 4.08, 4.57]^T$ where $N = 10$. Note that the cost is much higher in this case due to the larger number of sampling points. Moreover, none of the optimal switching locations is at 0 or $L$, consistent with Prop. 1. We also increased $N$ to 11 with $\theta = [90, 90, 90, 90, 90, 90, 10, 10, 10, 10, 10]$; Alg. 1 converged to $101.56 > J^* = 70.49$ under $N = 10$.



V. Extensions 






In this section we briefly discuss extensions to a "myopic" Receding Horizon (RH) framework and to a setting with multiple agents. Our proposed uncertainty model can be directly used to solve the persistent monitoring problem with an RH approach by solving Problem P1 not for the time horizon $T$, but for a smaller time window $H$, where $H < T$, repeatedly every time interval $h < H$. Because $H$ is usually much smaller than $T$, and since the optimal control is shown to be "bang-bang" when not inside a singular arc, it can be assumed that the control is constant (denoted as $u$) during the horizon $[t, t+H]$. In this case, the problem of minimizing the cost function over $u \in [-1, 1]$ is a scalar optimization problem and its solution can be obtained explicitly, given the initial conditions of $s(t)$ and $R_i(t)$. The RH controller operates as follows: at time $t$, the optimal control is computed for $[t, t+H]$ and is used for the time interval $[t, t+h]$. This process is repeated every $h$ units of time, until $t = T$. In our numerical examples, the cost obtained using the RH framework is very close to the optimal cost (consistently within 5%), and since an explicit solution is available, the control can be computed quickly and in real time. The RH framework can also accommodate situations where events are triggered in real time at some sampling points; in the virtual queue analogy, this means the inflow rates $A_i$ of some queues are time-varying.
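The RH loop described above can be sketched in Python as follows. Instead of the explicit scalar solution mentioned in the text, the sketch uses a plain grid search over constant controls $u \in [-1, 1]$, which is a stand-in technique and not the authors' method. It also assumes the linear sensing model $p_i(s) = \max(0, 1 - |s - \alpha_i|/r)$ and dynamics $\dot{R}_i = A_i - B p_i(s)$ with $R_i \geq 0$; all function names and the values of `H`, `h`, `dt` and `n_candidates` are illustrative assumptions.

```python
import numpy as np

def step(s, R, u, alpha, A, B, r, L, dt):
    """One forward-Euler step of the assumed agent/uncertainty model."""
    p = np.clip(1.0 - np.abs(s - alpha) / r, 0.0, None)   # assumed sensing model
    R_next = np.maximum(R + (A - B * p) * dt, 0.0)        # R_i stays nonnegative
    s_next = float(np.clip(s + u * dt, 0.0, L))           # stay inside [0, L]
    return s_next, R_next

def predicted_cost(s, R, u, horizon, **model):
    """Integral of sum_i R_i over the horizon under a constant control u."""
    total = 0.0
    for _ in range(int(round(horizon / model["dt"]))):
        total += R.sum() * model["dt"]
        s, R = step(s, R, u, **model)
    return total

def receding_horizon(T=36.0, H=4.0, h=1.0, L=20.0, M=21, A=0.01, B=3.0,
                     r=4.0, R0=2.0, dt=0.1, n_candidates=21):
    model = dict(alpha=np.linspace(0.0, L, M), A=A, B=B, r=r, L=L, dt=dt)
    candidates = np.linspace(-1.0, 1.0, n_candidates)     # grid over u
    s, R, t, total = 0.0, np.full(M, R0), 0.0, 0.0
    positions = [s]
    while t < T - 1e-9:
        horizon = min(H, T - t)                           # do not plan past T
        # Pick the constant control with the lowest predicted cost on [t, t+H].
        u = min(candidates,
                key=lambda c: predicted_cost(s, R, c, horizon, **model))
        # Apply it only on [t, t+h], then replan.
        for _ in range(int(round(h / dt))):
            total += R.sum() * dt
            s, R = step(s, R, u, **model)
            positions.append(s)
        t += h
    return total / T, np.array(positions)

if __name__ == "__main__":
    cost, positions = receding_horizon()
    print(cost, positions.min(), positions.max())
```

Because the controller only looks $H$ time units ahead, the resulting trajectory depends noticeably on the choice of $H$ and $h$, so these values should be treated as tuning parameters.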

This approach also opens up future work for multiple agents in 2-D or 3-D mission spaces. In a multi-agent framework, we can use the same model for uncertainty, but with a joint event detection probability function $p(x, s_1, \ldots, s_n)$, where there are $n$ agents. This joint probability can be expressed in terms of individual detection probabilities $p(x, s_i)$ as: $p(x, s_1, \ldots, s_n) = 1 - \prod_{i=1}^{n} (1 - p(x, s_i))$. Although the optimal control problem can still be fully solved for multiple agents in the 1-D mission space, this problem quickly becomes intractable in higher dimensions. In this case, we aim to develop a unified receding horizon approach that integrates with our previous cooperative coverage control strategies [13].
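The joint detection probability is the complement of the event that every agent misses, which is the form obtained when agents detect independently. A minimal Python sketch, with the hypothetical function name `joint_detection_probability` and the individual probabilities $p(x, s_i)$ supplied directly as numbers:

```python
import math

def joint_detection_probability(individual):
    """Return 1 - prod_i (1 - p_i) for individual detection probabilities p_i."""
    return 1.0 - math.prod(1.0 - p for p in individual)

if __name__ == "__main__":
    # Two agents, each detecting with probability 0.5.
    print(joint_detection_probability([0.5, 0.5]))
```

With no agents the product is empty and the joint probability is 0; adding an agent can only increase it.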

VI. Conclusions 

We have formulated a persistent monitoring problem 
where we consider a dynamic environment with uncertainties 
at points changing depending on the proximity of the agent. 
We obtained an optimal control solution that minimizes 
the accumulated uncertainty over the environment, in the 
case of a single agent and 1-D mission space. The solution 
is characterized by a sequence of switching points, and 
we use an IPA-based gradient algorithm to compute the 
solution. We also discussed extensions of our approach using 
a receding horizon framework. Ongoing work aims at solving 
the problem with multiple agents and a richer dynamical 
model for each agent, as well as addressing the persistent 
monitoring problem in 2-D and 3-D mission spaces. 